Abstract: Two separate statistical tests are described and developed in order to test
un-binned data sets for adherence to the power-law form. The first test employs the
TP-statistic, a function defined to deviate from zero when the sample deviates from the
power-law form, regardless of the value of the power index. The second test employs a
likelihood ratio test to reject a power-law background in favor of a model signal
distribution with a cut-off.

Introduction and Formalism

The question of whether the cosmic ray energy spectrum exhibits a cut-off at the very
highest energies is of central interest to cosmic ray (CR) physics [4, 10]. The flux of
CRs at these energies is very small, about 3 per km$^2$ per steradian per century, and,
therefore, statistical analysis techniques which clearly quantify one's knowledge of flux
suppression are useful. In this note we apply the statistics first developed for binned CR
data sets in [6] to an un-binned analysis. We also introduce a new test based on a
likelihood ratio and show that both statistics can quantify our knowledge of a flux
suppression.

We first establish the mathematical foundations of the analysis. The CR flux follows a
power-law for over 10 orders of magnitude. The fundamental probability distribution
function (p.d.f.) governing the power-law assumption (normalized such that
$\langle 1 \rangle_x = \int_{x_{\min}}^{\infty} f_x(x; x_{\min}, \gamma)\,dx = 1$) is

$$f_x(x; x_{\min}, \gamma) = A\,x^{-\gamma}, \qquad (1)$$

where $A = (\gamma - 1)\,x_{\min}^{\gamma - 1}$ and the parameter $\gamma$ is referred to
as the spectral index.

The $n$th raw moment of this distribution diverges [8] for $n > 2$ with $\gamma < 3$.
Alternatively, the expected value of $\ln(x/x_{\min})$ is better behaved and offers a
crucial result of this analysis. Analytically we find,

$$\nu_n \equiv \langle \ln^n(x/x_{\min}) \rangle_x = \frac{n!}{(\gamma - 1)^n}. \qquad (2)$$

In eq. 3 we denote the sorted (from least to greatest) data set as
$\{X_{(1)}, X_{(2)}, \ldots, X_{(N)}\}$. To apply these statistics to an un-binned data
set we calculate $\hat{\nu}_n(X_{(j)})$ for each minimum $X_{(j)}$.
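The per-threshold calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the averaging convention (all events at or above $X_{(j)}$, so $N - j + 1$ events) is an assumption. Under eq. 1 the variable $\ln(x/x_{\min})$ is exponentially distributed, so the expected log-moments are $n!/(\gamma - 1)^n$, which the usage lines compare against.

```python
import math
import numpy as np

def power_law_sample(n, gamma, x_min=1.0, rng=None):
    """Draw n events from the power-law p.d.f. of eq. 1 by inverse-transform sampling."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(n)                              # uniform on [0, 1)
    return x_min * (1.0 - u) ** (-1.0 / (gamma - 1.0))

def log_moments_by_threshold(x):
    """Sample means of ln(X/X_(j)) and ln^2(X/X_(j)) for every threshold X_(j).

    Assumed convention: the average for threshold X_(j) runs over the
    m = N - j + 1 events with X_(i) >= X_(j).
    """
    xs = np.sort(np.asarray(x, dtype=float))
    logs = np.log(xs)
    n = xs.size
    m = n - np.arange(n)                           # events at or above each threshold
    s1 = np.cumsum(logs[::-1])[::-1]               # sum over i >= j of ln X_(i)
    s2 = np.cumsum((logs ** 2)[::-1])[::-1]        # sum over i >= j of ln^2 X_(i)
    nu1 = s1 / m - logs
    nu2 = s2 / m - 2.0 * logs * s1 / m + logs ** 2
    return xs, nu1, nu2

gamma = 2.75
xs, nu1, nu2 = log_moments_by_threshold(
    power_law_sample(3500, gamma, rng=np.random.default_rng(1)))
expected_nu1 = math.factorial(1) / (gamma - 1.0) ** 1
expected_nu2 = math.factorial(2) / (gamma - 1.0) ** 2
print(nu1[0], expected_nu1)
print(nu2[0], expected_nu2)
```

The reverse cumulative sums give all $N$ thresholds in one pass instead of re-averaging the tail for each $j$.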

We also study a toy p.d.f. which is designed to mimic a power-law up to a certain energy
but then exhibit a sharp "Fermi-Dirac like" cut-off above that energy [6]. We follow the
parameterization used in [2], where $B$ is chosen such that $f_{FD}$ is normalized over
the interval $[x_{\min}, \infty)$, i.e. $\langle 1 \rangle_{FD} = 1$.
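As an illustration of how such a normalization constant can be obtained numerically, the sketch below assumes a specific functional form that is not quoted from the text: $f_{FD}(x) = B\,x^{-\gamma} / (1 + e^{(\log_{10} x - \log x_c)/w_c})$, with "log" taken to be base 10. The function names, the integration range and the grid size are likewise hypothetical choices.

```python
import numpy as np

def suppression(x, log_xc, wc):
    """Assumed Fermi-Dirac-like factor 1 / (1 + exp((log10 x - log_xc) / wc))."""
    z = (np.log10(x) - log_xc) / wc
    return np.exp(-np.logaddexp(0.0, z))           # numerically stable for large |z|

def fd_norm(gamma, log_xc, wc, x_min=1.0, decades=12, n_grid=200001):
    """Find B such that the assumed toy p.d.f. integrates to 1 on [x_min, infinity)."""
    u = np.linspace(np.log(x_min), np.log(x_min) + decades * np.log(10.0), n_grid)
    x = np.exp(u)
    # Integrand of x^-gamma * suppression, rewritten in the variable u = ln x.
    y = x ** (1.0 - gamma) * suppression(x, log_xc, wc)
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u))   # trapezoid rule
    return 1.0 / integral

def fd_pdf(x, gamma, log_xc, wc, x_min=1.0):
    """Evaluate the assumed toy p.d.f.; zero below x_min."""
    x = np.asarray(x, dtype=float)
    b = fd_norm(gamma, log_xc, wc, x_min)
    return np.where(x >= x_min, b * x ** (-gamma) * suppression(x, log_xc, wc), 0.0)

gamma, log_xc, wc = 2.75, 1.0, 0.1
A = (gamma - 1.0) * 1.0 ** (gamma - 1.0)           # power-law normalization, x_min = 1
B = fd_norm(gamma, log_xc, wc)
print(B / A)                                       # slightly above 1: the cut removes tail mass
```

Integrating in $\ln x$ keeps the grid dense where the steeply falling spectrum carries most of its weight.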

Binned vs Un-binned Spectral-Index Estimators

Figure 1: Estimates of the log-binned ($\hat{\gamma}_{lb}$) and the un-binned
($\hat{\gamma}_{ub}$) spectral index for $10^5$ Monte-Carlo trials. For each trial we
draw $N = 3500$ events from a power-law with $\gamma = 2.75$.

Under the power-law assumption, we can take the log of both sides of eq. 1 to yield
$\log f_x = \log((\gamma - 1)/x_{\min}) - \gamma \log(x/x_{\min})$. The log-binned
estimate, $\hat{\gamma}_{lb}$, is obtained from the slope of the line which results in
the minimum $\chi^2$ fit to the logarithmically binned ("lb") histogram of a particular
data set. The un-binned maximum likelihood ("ub") estimate of the spectral index can be
found analytically [8]:

$$\hat{\gamma}_{ub}(X_{(j)}) = 1 + 1/\hat{\nu}_1(X_{(j)}). \qquad (5)$$
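A minimal sketch of the two estimators on a single simulated sample follows. The un-binned function is eq. 5 directly; the log-binned function is only one plausible realization of a minimum-$\chi^2$ line fit, and its details (10 bins per decade, bins with fewer than 5 counts dropped, weights $\sqrt{n}$) are assumptions rather than the procedure used in the text.

```python
import numpy as np

def gamma_unbinned(x, x_min):
    """Eq. 5: one plus the inverse of the mean of ln(x / x_min)."""
    x = np.asarray(x, dtype=float)
    x = x[x >= x_min]
    return 1.0 + 1.0 / np.mean(np.log(x / x_min))

def gamma_log_binned(x, x_min, bins_per_decade=10, min_count=5):
    """Weighted straight-line fit to log10(counts) in equal-width log10 bins.

    For equal-width bins in t = log10(x / x_min) the expected counts fall as
    10^((1 - gamma) t), so the fitted slope is (1 - gamma).
    """
    x = np.asarray(x, dtype=float)
    t = np.log10(x[x >= x_min] / x_min)
    width = 1.0 / bins_per_decade
    edges = np.arange(0.0, t.max() + width, width)
    counts, edges = np.histogram(t, bins=edges)
    centers = 0.5 * (edges[1:] + edges[:-1])
    keep = counts >= min_count                     # assumed cut on sparse bins
    # Poisson error on log10(n) is about 1 / (ln(10) sqrt(n)); weight = 1 / error.
    slope, _ = np.polyfit(centers[keep], np.log10(counts[keep]), 1,
                          w=np.sqrt(counts[keep]))
    return 1.0 - slope

rng = np.random.default_rng(11)
x = (1.0 - rng.random(3500)) ** (-1.0 / (2.75 - 1.0))   # power-law draws, x_min = 1
g_ub = gamma_unbinned(x, 1.0)
g_lb = gamma_log_binned(x, 1.0)
print(g_ub, g_lb)
```

The binned result depends on the binning and on how sparse bins are treated, which is one reason the un-binned estimate is easier to characterize.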

This estimator is within 1% of the true $\gamma$ for $N > 100$ and it is asymptotically
unbiased. The variance of this estimator is within 1% of the Cramér-Rao lower bound,
given by $\sigma_\gamma \ge (\gamma - 1)/\sqrt{N}$, for [7] $N > 100$. As derived in [5],
we write the asymptotic p.d.f. of $\hat{\gamma}_{ub}$ as
$f_{ub}(\hat{\gamma}_{ub}; N, \gamma)$.
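The Cramér-Rao statement can be checked with a small simulation. The sketch below uses 1000 trials rather than a large production run, and the function names and seed are illustrative assumptions.

```python
import numpy as np

def gamma_ub_trials(n_trials, n_events, gamma, x_min=1.0, seed=0):
    """Un-binned estimate (eq. 5) for each of n_trials independent power-law samples."""
    rng = np.random.default_rng(seed)
    u = rng.random((n_trials, n_events))
    x = x_min * (1.0 - u) ** (-1.0 / (gamma - 1.0))    # inverse-transform sampling
    return 1.0 + 1.0 / np.mean(np.log(x / x_min), axis=1)

def cramer_rao_sigma(n_events, gamma):
    """Lower bound on the standard deviation of a spectral-index estimate."""
    return (gamma - 1.0) / np.sqrt(n_events)

estimates = gamma_ub_trials(n_trials=1000, n_events=3500, gamma=2.75)
bound = cramer_rao_sigma(3500, 2.75)
print(estimates.mean(), estimates.std(ddof=1), bound)
```

For $N = 3500$ and $\gamma = 2.75$ the bound evaluates to about 0.030, and the spread of the simulated estimates should sit close to it.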

To illustrate the benefits of using un-binned estimators, $10^5$ Monte-Carlo trials were
conducted. For each trial we draw $N = 3500$ events from a power-law with $\gamma = 2.75$
($x_{\min} = 1$) and calculate $\hat{\gamma}_{lb}$ and $\hat{\gamma}_{ub}$. These numbers
are chosen to be approximately consistent with the flux reported [1] by the Auger
Collaboration at ICRC 2005, as studied in [6]. In Figure 1 we plot histograms of these
estimators and we note that the analytic prediction ($f_{ub}$ is not a "fit") represents
a good approximation for the distribution of $\hat{\gamma}_{ub}$. The mean (over the
trials) of $\hat{\gamma}_{lb}$ is 2.76 with deviation 0.045 while the corresponding
values for $\hat{\gamma}_{ub}$ are 2.75 and 0.030, verifying that $\hat{\gamma}_{ub}$ has
smaller error and less bias [3] than $\hat{\gamma}_{lb}$. Since we use



TP-statistic 

We define the TP-statistic to be,

$$\tau = \nu_1^2 - \nu_2/2, \qquad (6)$$
$$\hat{\tau}(X_{(j)}) = \hat{\nu}_1^2(X_{(j)}) - \tfrac{1}{2}\hat{\nu}_2(X_{(j)}). \qquad (7)$$

The utility of using this statistic comes from the fact [9] that eq. 6 is zero for a pure
power-law and thus eq. 7 will tend to zero as $N \to \infty$, regardless of the value of
$\gamma$.
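The $\gamma$-independence can be seen numerically. In the sketch below (function names are illustrative, not from the text) the population value uses the power-law log-moments $\nu_1 = 1/(\gamma - 1)$ and $\nu_2 = 2/(\gamma - 1)^2$, and the sample value is eq. 7 evaluated with a fixed threshold.

```python
import numpy as np

def tp_population(gamma):
    """Eq. 6 evaluated with the power-law log-moments; vanishes for any gamma > 1."""
    nu1 = 1.0 / (gamma - 1.0)
    nu2 = 2.0 / (gamma - 1.0) ** 2
    return nu1 ** 2 - 0.5 * nu2

def tp_statistic(x, x_min):
    """Eq. 7 for the events at or above the threshold x_min."""
    x = np.asarray(x, dtype=float)
    y = np.log(x[x >= x_min] / x_min)
    return np.mean(y) ** 2 - 0.5 * np.mean(y ** 2)

rng = np.random.default_rng(5)
tau_hat = {}
for gamma in (2.0, 2.75, 3.5):
    x = (1.0 - rng.random(20000)) ** (-1.0 / (gamma - 1.0))   # power law, x_min = 1
    tau_hat[gamma] = tp_statistic(x, 1.0)
    print(gamma, tp_population(gamma), tau_hat[gamma])
```

All three sample values scatter around zero even though the spectral indices differ, which is what makes the statistic usable without first fixing $\gamma$.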

We may approximate the asymptotic joint distribution of $\hat{\nu}_1$ and $\hat{\nu}_2$
as a bivariate Gaussian $f_{\nu_1 \nu_2}(\nu_1, \nu_2)$ with known means, variances and
correlation coefficient [5]. Thus, for a given $N$ and $\gamma$, we calculate the p.d.f.
of $\tau$ to be,

$$\int_{-\infty}^{\infty} f_{\nu_1 \nu_2}\big(t,\, 2(t^2 - \tau)\big)\,dt. \qquad (8)$$

The analytic "location" $\langle \tau \rangle_{TP}$ and "shape"
$\langle \sigma_\tau \rangle_{TP} = \sqrt{\langle \tau^2 \rangle_{TP} - \langle \tau \rangle_{TP}^2}$
parameters of this distribution are consistent with simulation generated values. Since
the numeric integration required to calculate these quantities can be carried out faster
than the requisite simulations we use the former to estimate the expected mean and
variance of the power-law sample TP-statistic.

We estimate the significance of the TP-statistic for a given sample as

$$(\hat{\tau} - \langle \tau \rangle_{TP}) / \langle \sigma_\tau \rangle_{TP}. \qquad (9)$$

A spectrum with flux suppression in the tail (like that in eq. 4) will result in a
positive significance [6]. We note from [5] that
$\langle \sigma_\tau \rangle_{TP} \sim N^{-1/2} (\gamma - 1)^{-2}$.
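A rough version of eq. 9 can be coded with asymptotic stand-ins: here $\langle \tau \rangle_{TP}$ is taken as zero and $\langle \sigma_\tau \rangle_{TP}$ as $N^{-1/2}(\gamma - 1)^{-2}$, instead of the numerically integrated values described in the text. The hard cut at $x = 10$ is an assumed, deliberately crude form of flux suppression, and the function name is hypothetical.

```python
import numpy as np

def tp_significance(x, x_min, gamma):
    """Approximate eq. 9 with <tau> ~ 0 and sigma_tau ~ N^(-1/2) (gamma - 1)^(-2)."""
    x = np.asarray(x, dtype=float)
    y = np.log(x[x >= x_min] / x_min)
    tau_hat = np.mean(y) ** 2 - 0.5 * np.mean(y ** 2)
    sigma_tau = (gamma - 1.0) ** -2 / np.sqrt(y.size)
    return tau_hat / sigma_tau

rng = np.random.default_rng(3)
gamma = 2.75
x = (1.0 - rng.random(3500)) ** (-1.0 / (gamma - 1.0))   # pure power law, x_min = 1

z_power_law = tp_significance(x, 1.0, gamma)             # expected to scatter around 0
z_truncated = tp_significance(x[x < 10.0], 1.0, gamma)   # tail removed: pushed positive
print(z_power_law, z_truncated)
```

Removing the tail shrinks $\hat{\nu}_2$ faster than $\hat{\nu}_1^2$, which is why a suppressed spectrum yields a positive significance.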

Figure 2: The TP-statistic is sensitive to flux suppression for these toy distributions,
see text for explanation.

In Figure 2 we illustrate the behavior of this statistic when applied to a distribution
with suppression in the tail. Using eq. 4 we analytically calculate
$\gamma_{ub} = 1 + 1/\langle \ln(x/x_{\min}) \rangle_{FD}$ (lower left) and
$\tau = \langle \ln(x/x_{\min}) \rangle_{FD}^2 - 0.5 \langle \ln^2(x/x_{\min}) \rangle_{FD}$
(upper right) with $\gamma = 2.75$, $\log x_c = 1.0$, for three choices of $w_c$ and as a
function of $x_{\min}$. We also calculate the expected value (and deviation) of these
quantities when applied to a data set containing 3500 events, drawn from a pure power-law
with values greater than 1.0. For each $x_{\min}$ we estimate the number of events $N$
with value greater than $x_{\min}$ as $3500\,x_{\min}^{1-\gamma}$. The upper left panel
shows the p.d.f.'s (on a log-log scale) normalized to unity on $[1.0, \infty)$. The lower
right contains the significance of the TP-statistic; for the lowest $x_{\min}$ (i.e.
$N = 3500$) the power-law assumption can be rejected for the model cut-off distributions
at the $\sim 4\sigma$ confidence level.
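The event-count scaling follows from the power-law survival function, $P(X > x_{\min}) = x_{\min}^{1-\gamma}$ for a sample that starts at 1.0. A small sketch, with hypothetical function names, that also propagates the count into the expected spread of the un-binned index estimate:

```python
import numpy as np

def expected_events_above(x_min, n_total=3500, gamma=2.75, x0=1.0):
    """Expected number of events above x_min for n_total power-law events above x0."""
    return n_total * (np.asarray(x_min, dtype=float) / x0) ** (1.0 - gamma)

def expected_sigma_gamma(x_min, n_total=3500, gamma=2.75, x0=1.0):
    """Cramér-Rao deviation (gamma - 1) / sqrt(N) at the reduced event count."""
    return (gamma - 1.0) / np.sqrt(expected_events_above(x_min, n_total, gamma, x0))

thresholds = np.array([1.0, 2.0, 5.0, 10.0])
n_above = expected_events_above(thresholds)
sigma_above = expected_sigma_gamma(thresholds)
for t, n, s in zip(thresholds, n_above, sigma_above):
    print(t, n, s)
```

The rapid loss of events with rising threshold is what widens the expected bands as $x_{\min}$ increases.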

A Likelihood Ratio Test 

Here we introduce a likelihood ratio test designed to discriminate a model signal
(power-law with a cut) from a background (pure power-law) hypothesis and to be weakly
dependent on $\gamma$. We may write the natural log of the ratio of the signal likelihood
$L_{FD} = \prod f_{FD}(x_i)$ to that of the background $L_x = \prod f_x(x_i)$ as,

$$R(\gamma, \log x_c, w_c) = N \ln\{C(\gamma, \log x_c, w_c)\}$$

We note that $C = B/A$ (see eqs. 4 and 1) contains the only dependence on $\gamma$ and is
independent of the data points under study, i.e. $R$ contains no term involving
$\log x^{\gamma}$. Indeed, for any given $\log x_c$ and $w_c$, the quantity $\ln C$ is
linearly dependent on $\gamma$ with slope $\sim 0.125$. In this sense the ratio test is
weakly dependent on $\gamma$. However, in order to evaluate the efficiency of this test
to reject a particular power-law background in favor of the cut-off signal we must choose
$\gamma$ a priori.
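To make the separation between the $\gamma$-dependent constant and the data-dependent part concrete, the sketch below evaluates the log-likelihood ratio under an assumed cut-off form, $f_{FD}(x) = B\,x^{-\gamma}/(1 + e^{(\log_{10} x - \log x_c)/w_c})$. Under that assumption the $x^{-\gamma}$ factors cancel in the ratio, leaving $N \ln C$ plus a sum that does not involve $\gamma$. The functional form, the base-10 logarithm and all names are assumptions for illustration only.

```python
import numpy as np

def log_c(gamma, log_xc, wc, x_min=1.0, decades=12, n_grid=200001):
    """ln C = ln(B / A), using 1/C = integral of f_x(x) times the suppression factor."""
    u = np.linspace(0.0, decades * np.log(10.0), n_grid)      # u = ln(x / x_min)
    z = (u / np.log(10.0) + np.log10(x_min) - log_xc) / wc
    y = (gamma - 1.0) * np.exp(-(gamma - 1.0) * u - np.logaddexp(0.0, z))
    return -np.log(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))

def log_likelihood_ratio(x, gamma, log_xc, wc, x_min=1.0):
    """ln(L_FD / L_x) for the assumed cut-off form; only the first term depends on gamma."""
    x = np.asarray(x, dtype=float)
    z = (np.log10(x) - log_xc) / wc
    return x.size * log_c(gamma, log_xc, wc, x_min) - np.sum(np.logaddexp(0.0, z))

x = np.array([1.5, 3.0, 8.0, 12.0, 30.0])
r_a = log_likelihood_ratio(x, 2.75, 1.0, 0.1)
r_b = log_likelihood_ratio(x, 2.50, 1.0, 0.1)
# The difference comes entirely from the N ln C term.
print(r_a - r_b, x.size * (log_c(2.75, 1.0, 0.1) - log_c(2.50, 1.0, 0.1)))
```

Events well below the cut each add roughly $\ln C$ to the ratio, while events above it are penalized in proportion to how far they lie beyond $\log x_c$ in units of $w_c$.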

Figure 3: The ratio test is sensitive to flux suppression for this MC set, see text for
explanation.

To illustrate how this test could be applied to a CR data set we generate 3500 "toy"
events from $f_{FD}$ with input parameters $\gamma = 2.75$, $\log x_c = 1$ and
$w_c = 0.1$ (see Figure 3). With the a priori choice of $\gamma = 2.75$, we then
calculate $R(2.75, \log x_c, w_c)$ by scanning over the ranges $0.03 < w_c < 0.17$ and
$0.93 < \log x_c < 1.07$. The maximum $\ln R_{max} = 81.83$ gives us the fit parameter
estimates $\log x_c = 0.97 \pm 0.04$ and $w_c = 0.10 \pm 0.03$, where the 68% confidence
interval is approximated by the contour
$\ln R_{max} - \ln R(2.75, \log x_c, w_c) = 2.30/2$. By simulating $N_{bg} = 10^4$ sets
of 3500 background events drawn from a pure power law (with $\gamma = 2.75$) and
performing the same parameter scan over $\log x_c$ and $w_c$, we can estimate the
efficiency $\beta$ of this test to reject the power-law in favor of the toy cut-off
model, i.e. $\beta \approx N_{\ln R > \ln R_{max}} / N_{bg}$. From the right panel of
Figure 3 we note that none of the $10^4$ background sets have $\ln R > \ln R_{max}$; we
can reject the power-law in favor of the model cut-off at the $\sim 4\sigma$ confidence
level.
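The scan-and-compare procedure can be sketched at reduced scale. Everything specific here is an assumption: the cut-off form $f_{FD}(x) = B\,x^{-\gamma}/(1 + e^{(\log_{10} x - \log x_c)/w_c})$, the 8 by 8 grid, the 50 background sets (instead of $10^4$), the seed and the function names. Signal events are drawn by accept-reject from power-law draws, which is valid because the assumed suppression factor never exceeds 1.

```python
import numpy as np

LN10 = np.log(10.0)

def power_law_sample(n, gamma, rng, x_min=1.0):
    return x_min * (1.0 - rng.random(n)) ** (-1.0 / (gamma - 1.0))

def cutoff_sample(n, gamma, log_xc, wc, rng, x_min=1.0):
    """Accept-reject: keep each power-law draw with probability 1 / (1 + exp(z))."""
    out = np.empty(0)
    while out.size < n:
        x = power_law_sample(2 * n, gamma, rng, x_min)
        z = (np.log10(x) - log_xc) / wc
        keep = rng.random(x.size) < np.exp(-np.logaddexp(0.0, z))
        out = np.concatenate([out, x[keep]])
    return out[:n]

def log_c(gamma, log_xc, wc, x_min=1.0, decades=12, n_grid=20001):
    """ln(B / A) for the assumed cut-off form, by trapezoid integration in ln x."""
    u = np.linspace(0.0, decades * LN10, n_grid)
    z = (u / LN10 + np.log10(x_min) - log_xc) / wc
    y = (gamma - 1.0) * np.exp(-(gamma - 1.0) * u - np.logaddexp(0.0, z))
    return -np.log(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))

def scan_max(x, log_xc_grid, wc_grid, ln_c):
    """Largest log-likelihood ratio over the grid; ln_c[i, j] is precomputed."""
    t = np.log10(x)
    best = -np.inf
    for i, lx in enumerate(log_xc_grid):
        for j, w in enumerate(wc_grid):
            r = x.size * ln_c[i, j] - np.sum(np.logaddexp(0.0, (t - lx) / w))
            best = max(best, r)
    return best

rng = np.random.default_rng(7)
gamma = 2.75
log_xc_grid = np.linspace(0.93, 1.07, 8)
wc_grid = np.linspace(0.03, 0.17, 8)
ln_c = np.array([[log_c(gamma, lx, w) for w in wc_grid] for lx in log_xc_grid])

r_signal = scan_max(cutoff_sample(3500, gamma, 1.0, 0.1, rng), log_xc_grid, wc_grid, ln_c)
r_bg = np.array([scan_max(power_law_sample(3500, gamma, rng), log_xc_grid, wc_grid, ln_c)
                 for _ in range(50)])
frac_bg_above = np.mean(r_bg > r_signal)       # fraction of background sets above the signal
print(r_signal, r_bg.max(), frac_bg_above)
```

With only 50 background sets the fraction can resolve nothing finer than 2%; quoting a significance near $4\sigma$ requires a background ensemble of the size used in the text.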

When applying this test to a real CR data set $\gamma$ is not known a priori and one
would want to estimate it. Studies of the ratio test with this extra degree of freedom
are currently underway.

Conclusions 

We began this note by verifying that the log-binned spectral index estimator has more
bias and a larger error than the un-binned (maximum likelihood) estimator. We then
detailed two un-binned statistical tests sensitive to flux suppression. Both tests show
high sensitivity for rejecting the power-law hypothesis in favor of a toy flux
suppression model and depend only weakly on the true spectral index. Applying these tests
to 3500 events drawn from a toy cut-off distribution (see eq. 4) we can reject the
power-law model in favor of the cut-off model at a confidence level of $\sim 4$ standard
deviations.